XAI-Kit Installation Guide¶

XAI-Kit is a Python library for Explainable Artificial Intelligence (XAI). Follow these steps to install the XAI-Kit library.

You can install XAI-Kit via pip; in this example, we'll pin version 0.0.7.

How to Install the Library¶

pip install xai-kit        # Method 1: install the latest release
pip install xai-kit==0.0.7 # Method 2: pin a specific version
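
Once installed, you can confirm the version from Python. The snippet below uses the standard library's importlib.metadata and assumes the distribution is published under the name xai-kit; adjust the name if your environment differs:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# "xai-kit" is the assumed distribution name from the pip command above.
print(installed_version("xai-kit") or "xai-kit is not installed")
```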

How to Import and Use XAI-Kit¶

XAI-Kit is a powerful library for Explainable Artificial Intelligence (XAI). In this section, we'll show you how to import the library and some of its essential components.

Importing the XAI-Kit Library¶

You can import XAI-Kit and its submodules as follows:

import xai

This line imports the entire XAI-Kit library, making its functionalities available for use.

Loading Datasets¶

XAI-Kit provides built-in datasets for regression tasks. Here are examples of how to load two different datasets:

1. Admission Prediction Dataset¶

from xai.datasets.regression_data import load_admission_prediction_data

This dataset is used for regression tasks related to admission prediction.

2. Bike Sharing Demand Dataset¶

from xai.datasets.regression_data import load_bike_sharing_demand_data

The bike sharing demand dataset is another built-in dataset used for regression analysis.

Loading Regression Models and Data¶

XAI-Kit also includes pre-built regression models and corresponding datasets. You can load these models and data with the following commands:

from xai.regression.models import load_linear_regression_model_and_data
from xai.regression.models import load_lasso_regression_model_and_data
from xai.regression.models import load_random_forest_model_and_data
from xai.regression.models import load_support_vector_model_and_data

These commands load different regression models along with the datasets suitable for each model.

Visualization Tools¶

XAI-Kit offers a variety of visualization tools for regression analysis. Here are some of the visualization functions you can use:

from xai.regression.visualizations import create_two_column_scatter_plot
from xai.regression.visualizations import create_interactive_scatter_plot
from xai.regression.visualizations import create_3d_scatter_plot
from xai.regression.visualizations import create_3d_scatter_dashboard
from xai.regression.visualizations import create_residual_plots
from xai.regression.visualizations import create_feature_importance_plot
from xai.regression.visualizations import create_actual_vs_predicted_distribution
from xai.regression.visualizations import create_qq_plot
from xai.regression.visualizations import create_residual_plot_with_shapley
from xai.regression.visualizations import visualize_regression_metrics
from xai.regression.visualizations import visualize_advanced_regression_metrics

These functions provide various visualizations and tools to help you analyze regression models, datasets, and their performance.

Now that you know how to import the library and access its functionalities, you can start using XAI-Kit for your explainable AI needs!

How to Get Example Datasets for Regression Analysis¶

If you don't have external datasets, you can use these examples.

Option 1: Load the Admission Prediction dataset¶

In [9]:
# Option 1: Load the Admission Prediction dataset
from xai.datasets.regression_data import load_admission_prediction_data

# Use the `load_admission_prediction_data` function to load the dataset
df = load_admission_prediction_data()

# Check the data types of columns in the dataset
df.dtypes

# Display the first few rows of the loaded dataset to inspect its content
df.head()
Out[9]:
   GRE Score  TOEFL Score  University Rating  Statement of Purpose  Letter of Recommendation  CGPA  Research  Chance of Admit
0        337          118                  4                   4.5                       4.5  9.65         1             0.92
1        324          107                  4                   4.0                       4.5  8.87         1             0.76
2        316          104                  3                   3.0                       3.5  8.00         1             0.72
3        322          110                  3                   3.5                       2.5  8.67         1             0.80
4        314          103                  2                   2.0                       3.0  8.21         0             0.65

Explanation for Option 1:

  • We are importing the load_admission_prediction_data function from xai.datasets.regression_data.
  • We use this function to load the Admission Prediction dataset into the DataFrame df.
  • To inspect the contents of the loaded dataset, we display the first few rows using df.head().
  • We also check the data types of columns in the dataset using df.dtypes.

Option 2: Load the Bike Sharing Demand dataset¶

In [8]:
# Option 2: Load the Bike Sharing Demand dataset
from xai.datasets.regression_data import load_bike_sharing_demand_data

# Use the `load_bike_sharing_demand_data` function to load the dataset
df = load_bike_sharing_demand_data()

# Check the data types of columns in the dataset
df.dtypes

# Display the last few rows of the loaded dataset to inspect its content
df.tail()
Out[8]:
                  datetime  season  holiday  workingday  weather   temp   atemp  humidity  windspeed  casual  registered  count
10881  2012-12-19 19:00:00       4        0           1        1  15.58  19.695        50    26.0027       7         329    336
10882  2012-12-19 20:00:00       4        0           1        1  14.76  17.425        57    15.0013      10         231    241
10883  2012-12-19 21:00:00       4        0           1        1  13.94  15.910        61    15.0013       4         164    168
10884  2012-12-19 22:00:00       4        0           1        1  13.94  17.425        61     6.0032      12         117    129
10885  2012-12-19 23:00:00       4        0           1        1  13.12  16.665        66     8.9981       4          84     88

Explanation for Option 2:

  • We are importing the load_bike_sharing_demand_data function from xai.datasets.regression_data.
  • We use this function to load the Bike Sharing Demand dataset into the DataFrame df.
  • To inspect the contents of the loaded dataset, we display the last few rows using df.tail().
  • We also check the data types of columns in the dataset using df.dtypes.

How to Load Different Regression Models¶

In case you don't have pre-trained models, you can use these examples to load various regression models and their associated data for your analysis.

Option 1: Linear Regression¶

In [ ]:
# To load a Linear Regression model, use the following import statement:
from xai.regression.models import load_linear_regression_model_and_data

# Next, utilize the `load_linear_regression_model_and_data` function to load the model and data:
X_train, y_train, X_test, y_test, model = load_linear_regression_model_and_data()

# You can inspect the loaded data:
# Display the training input data
X_train

# Display the training target data
y_train

# Display the testing input data
X_test

# Display the testing target data
y_test

# The loaded linear regression model can also be examined:
model

# Additionally, you can view the coefficients of the linear regression model:
model.coef_

Option 2: Lasso Regression¶

In [ ]:
# To load a Lasso Regression model, use the following import statement:
from xai.regression.models import load_lasso_regression_model_and_data

# Similar to Option 1, you can load the model and data, and inspect them as needed.
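
To make the inspection step concrete, here is a self-contained sketch using plain scikit-learn's Lasso on synthetic data. It assumes the estimator returned by load_lasso_regression_model_and_data exposes the same coef_ attribute as any scikit-learn linear model:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: the target depends on the first feature only.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 5 * X[:, 0] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1)
model.fit(X, y)

# Lasso shrinks irrelevant coefficients toward (often exactly) zero,
# so coef_ doubles as a rough feature-selection summary.
print(model.coef_)
```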

Option 3: Support Vector Machines (SVM)¶

In [ ]:
# To load a Support Vector Machines (SVM) model, use the following import statement:
from xai.regression.models import load_support_vector_model_and_data

# Again, you can load the model and data, and inspect them as required.
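
SVR objects expose slightly different attributes than ordinary linear models. As a self-contained sketch with plain scikit-learn, assuming a linear-kernel SVR (the coefficient-based examples later in this guide suggest the library's loader returns one):

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data: the target depends on the first feature only.
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=200)

model = SVR(kernel="linear", C=1.0)
model.fit(X, y)

# A linear-kernel SVR exposes coef_ (shape (1, n_features)) like a linear
# model, plus the support vectors that define the fitted function.
print(model.coef_)
print(model.support_vectors_.shape)
```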

Option 4: Random Forest Regressor¶

In [ ]:
# To load a Random Forest Regressor model, use the following import statement:
from xai.regression.models import load_random_forest_model_and_data

# As with previous options, you can load the model and data, and examine them as necessary.
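
Random forests have no coefficients; they report impurity-based feature importances instead. A self-contained scikit-learn sketch, assuming the loader returns a standard RandomForestRegressor:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the target depends on the first feature only.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 4 * X[:, 0] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# feature_importances_ sums to 1; the dominant feature should rank first.
print(model.feature_importances_)
```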

How to Use the Functions with Your Dataset and Models¶

Note: You can use the provided functions with your custom dataset and models, or with the datasets and models included in the library.

1. Function: create_two_column_scatter_plot¶

Goal:¶

The create_two_column_scatter_plot function helps users explore and understand the relationship between two columns or features in a dataset, particularly in the context of regression analysis.

Benefits in Regression Analysis:¶

  • Visual Exploration: Users can visually assess how changes in one variable (predictor) relate to another (target), aiding in model building and interpretation.

  • Pattern Detection: Scatter plots reveal patterns and correlations in the data, such as linear or nonlinear relationships.

  • Outlier Identification: Outliers, which can impact model accuracy, are easily spotted.

  • Model Validation: Users can validate regression models by comparing predictions to observed data.

  • Customization: The function allows customization for creating publication-ready plots.

  • Applicability: It can be applied to various regression tasks, adapting to custom datasets and models.

In summary, create_two_column_scatter_plot enhances regression analysis by providing a visual tool for pattern detection, outlier identification, model validation, and customization.

In [3]:
# Example 1:

# First, import the necessary modules and functions from the library:
from xai.datasets.regression_data import load_admission_prediction_data
from xai.regression.models import load_linear_regression_model_and_data
from xai.regression.visualizations import create_two_column_scatter_plot

# Next, load your dataset and model (or use one from the library):
X_train, y_train, X_test, y_test, model = load_linear_regression_model_and_data()
df = load_admission_prediction_data()

# Use the create_two_column_scatter_plot function to create a scatter plot:
create_two_column_scatter_plot(model, df, 'GRE Score', 'Chance of Admit', figsize=(16, 12))
In [4]:
# Example 2:

# Import additional libraries and modules for generating synthetic data and training a model:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from xai.regression.visualizations import create_two_column_scatter_plot

# Generate synthetic data for demonstration:
np.random.seed(0)
X = np.random.rand(100, 1) * 10
y = 2 * X + 1 + np.random.randn(100, 1)

# Create a pandas DataFrame from the synthetic data:
df = pd.DataFrame({'X': X.flatten(), 'y': y.flatten()})

# Create and train a simple linear regression model:
model = LinearRegression()
model.fit(df[['X']], df['y'])

# Specify customization options for the scatter plot:
title = "Scatter Plot: X vs. y"
xlabel = "X-axis"
ylabel = "y-axis"
figsize = (16, 12)
save_path = "scatter_plot.png"  # Optional: Specify a file path to save the plot

# Use the create_two_column_scatter_plot function to create the scatter plot with improved aesthetics:
fig = create_two_column_scatter_plot(model, df, 'X', 'y', title=title, xlabel=xlabel, ylabel=ylabel, figsize=figsize, save_path=save_path)

2. Function: create_interactive_scatter_plot¶

Goal: The goal of the create_interactive_scatter_plot function is to provide an interactive scatter plot for regression analysis.

Benefits:

  1. Visual Exploration: Users can visually explore relationships between input features and the target variable.

  2. Model Assessment: It helps users assess how well their regression model captures the data patterns.

  3. Data Inspection: Users can inspect data points and model predictions interactively.

  4. Decision Support: It supports data-driven decision-making by visualizing regression results effectively.

Overall, this function empowers users to gain insights from their regression model and dataset through an interactive and intuitive visualization.

In [27]:
# Import necessary libraries and functions
from xai.datasets.regression_data import load_admission_prediction_data
from xai.regression.models import load_linear_regression_model_and_data
from xai.regression.visualizations import create_interactive_scatter_plot

# Load a regression dataset (e.g., admission prediction data)
df = load_admission_prediction_data()

# Load or train a linear regression model and its associated training data
X_train, y_train, X_test, y_test, model = load_linear_regression_model_and_data()

# Create an interactive scatter plot for regression analysis
# Parameters:
# - model: The trained regression model.
# - df: The DataFrame containing dataset and predictions.
# - feature_columns: List of feature column names.
# - target_column: The name of the target variable.
app = create_interactive_scatter_plot(model, df, X_train.columns.tolist(), y_train.columns.tolist()[0])
C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\dash\resources.py:61: UserWarning:

You have set your config to `serve_locally=True` but A local version of https://codepen.io/chriddyp/pen/bWLwgP.css is not available.
If you added this file with `app.scripts.append_script` or `app.css.append_css`, use `external_scripts` or `external_stylesheets` instead.
See https://dash.plotly.com/external-resources

Explanation:

  1. We import the necessary libraries and functions required for this example.

  2. We load a regression dataset (in this case, the admission prediction data) using load_admission_prediction_data.

  3. We load or train a linear regression model and prepare its associated training data using load_linear_regression_model_and_data. This function provides the X_train, y_train, X_test, y_test, and model variables.

  4. We use the create_interactive_scatter_plot function to generate an interactive scatter plot for regression analysis. The parameters provided include the trained model, the DataFrame containing the dataset and model predictions (df), the list of feature column names (X_train.columns.tolist()), and the name of the target variable (y_train.columns.tolist()[0]).

This interactive scatter plot is valuable for exploring relationships between features and the target variable in regression tasks, enabling users to interactively analyze data points and model predictions.

3. Function: create_3d_scatter_plot¶

Goal: The goal of this code is to visualize a 3D scatter plot to explore the relationships between two input features and a target variable in a regression analysis.

Benefits:

  1. Visual Exploration: The 3D scatter plot helps users visually explore how two input features (Column1 and Column2) interact with the target variable (TargetColumn).

  2. Model Evaluation: It allows users to evaluate how well their regression model captures the data patterns in a three-dimensional space.

  3. Insight Generation: Users can gain insights into the data distribution and potential correlations between variables.

In [17]:
# Import necessary libraries and modules
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import plotly.offline as pyo
from xai.regression.visualizations import create_3d_scatter_plot

# Sample data generation
data = {
    'Column1': np.random.rand(100),
    'Column2': np.random.rand(100),
    'TargetColumn': 2 * np.random.rand(100) + 3,
}

# Create a DataFrame from the generated data
df = pd.DataFrame(data)

# Prepare data for regression modeling
X = df[['Column1', 'Column2']]
y = df['TargetColumn']

# Initialize and train a linear regression model
model = LinearRegression()
model.fit(X, y)

# Create the 3D scatter plot using the 'create_3d_scatter_plot' function
fig = create_3d_scatter_plot(model, df, 'Column1', 'Column2', 'TargetColumn')

# Show the plot in an interactive HTML file
pyo.plot(fig, filename='3d_scatter_plot.html')
C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\base.py:409: UserWarning:

X does not have valid feature names, but LinearRegression was fitted with feature names

Out[17]:
'3d_scatter_plot.html'

This code snippet generates synthetic data, trains a linear regression model, and then creates an interactive 3D scatter plot to visualize the relationships between Column1, Column2, and the TargetColumn. It is a valuable tool for gaining insights and assessing the regression model's performance in a three-dimensional space.

4. Function: create_3d_scatter_dashboard¶

Goal: The goal of this code is to create an interactive dashboard that displays a 3D scatter plot. This dashboard allows users to explore and interact with the relationships between two input features and a target variable in a regression analysis.

Benefits:

  1. Interactive Exploration: Users can interactively explore and analyze the relationships between the input features (Column1 and Column2) and the target variable (TargetColumn) in a three-dimensional space.

  2. Model Evaluation: The dashboard provides a dynamic way to evaluate how well a regression model captures the data patterns.

  3. Data Insight: Users can gain insights into the data distribution and identify potential correlations between variables through an intuitive interface.

In [13]:
# Import necessary libraries and modules
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from xai.regression.visualizations import create_3d_scatter_dashboard
# Sample data generation
data = {
    'Column1': np.random.rand(100),
    'Column2': np.random.rand(100),
    'TargetColumn': 2 * np.random.rand(100) + 3,
}

# Create a DataFrame from the generated data
df = pd.DataFrame(data)

# Define the names of X columns (input features) and the Y column (target variable)
x_columns = ['Column1', 'Column2']
y_column = 'TargetColumn'

# Sample model (replace with your own model)
model = LinearRegression()

# Train the model on your data
X = df[x_columns].values
y = df[y_column].values
model.fit(X, y)  # Fit the model to your data

# Create the Dash app for the 3D scatter dashboard
app = create_3d_scatter_dashboard(df, x_columns, y_column, model)
# Pass the DataFrame, X columns, Y column, and trained model to create the interactive dashboard.

This code snippet generates synthetic data, trains a linear regression model, and creates an interactive 3D scatter dashboard for visualizing and exploring relationships between Column1, Column2, and the TargetColumn. Users can interact with the dashboard to gain insights and evaluate the model's performance in a three-dimensional space.

5. Function: create_residual_plots¶

Goal: The goal of this code is to create residual plots for a regression model. Residual plots help users visualize the differences between the actual target values and the predicted values, allowing them to assess the model's performance.

Benefits:

  1. Model Evaluation: Residual plots are a fundamental tool for evaluating the performance of regression models. They help users identify patterns or trends in the model's errors.

  2. Assumption Checking: Residual plots can be used to check whether the underlying assumptions of linear regression, such as the normality of residuals and homoscedasticity, are met.

  3. Outlier Detection: Users can identify potential outliers or influential data points by examining the residuals.

In [14]:
# Import necessary libraries and modules
from xai.datasets.regression_data import load_admission_prediction_data
from xai.regression.models import load_lasso_regression_model_and_data
from xai.regression.visualizations import create_residual_plots

# Load the Lasso regression model and associated data
X_train, y_train, X_test, y_test, model = load_lasso_regression_model_and_data()
# This loads the Lasso regression model and splits the data into training and testing sets.

# Load the admission prediction dataset
df = load_admission_prediction_data()
# This loads the admission prediction dataset for analysis.

# Create residual plots using the provided model and dataset
fig = create_residual_plots(model, df, 'Chance of Admit')
# This function generates residual plots for the 'Chance of Admit' target variable and the associated model.

The code initializes a Lasso regression model, loads an admission prediction dataset, and then calls the create_residual_plots function to create residual plots specifically for the 'Chance of Admit' target variable. These plots help users evaluate the model's performance, check assumptions, and detect potential outliers.

6. Function: create_feature_importance_plot¶

Goal: The goal of this code is to create a feature importance plot for a regression model. Feature importance plots help users understand which features have the most significant impact on the model's predictions.

Benefits:

  1. Feature Insights: Users can gain insights into the relative importance of different features in their regression model. This information can be crucial for understanding which variables are driving the model's predictions.

  2. Model Understanding: Feature importance plots contribute to better model interpretability by highlighting the most influential features. This aids in explaining the model's behavior to stakeholders.

  3. Feature Selection: Feature importance can guide feature selection efforts. Users may choose to focus on the most important features for model simplification or improved performance.

In [15]:
# Import necessary libraries and modules
from xai.datasets.regression_data import load_bike_sharing_demand_data
from xai.regression.models import load_support_vector_model_and_data
from xai.regression.visualizations import create_feature_importance_plot

# Load the support vector regression model and associated data
X_train, y_train, X_test, y_test, model = load_support_vector_model_and_data()
# This loads the support vector regression model and splits the data into training and testing sets.

# Load the bike sharing demand dataset
df = load_bike_sharing_demand_data()
# This loads the bike sharing demand dataset for analysis.

# Extract feature importances (in this case, coefficients) and feature names
feature_importance = model.coef_.tolist()[0]
# Here, the feature importances are extracted from the model's coefficients.
feature_names = X_train.columns.tolist()
# The feature names are extracted from the training data columns.

# Create a feature importance plot using the extracted data
fig = create_feature_importance_plot(feature_importance, feature_names, width=1400, height=800)
# This function generates a feature importance plot with custom width and height settings.

# Show the feature importance plot
fig.show()
# The generated feature importance plot is displayed.

The code first loads a support vector regression model, loads a bike sharing demand dataset, extracts feature importances (coefficients), and feature names. Then, it calls the create_feature_importance_plot function to create the feature importance plot with customizable width and height settings. This plot helps users understand which features contribute most to the model's predictions.

7. Function: create_actual_vs_predicted_distribution¶

Goal: The goal of this code is to create a visualization that compares the actual values with the predicted values from a regression model. This plot helps users assess how well the model's predictions align with the true values.

Benefits:

  1. Model Evaluation: Users can visually assess how well the model performs by comparing its predictions to the actual outcomes. This is essential for understanding the model's accuracy and reliability.

  2. Residual Analysis: The plot provides insights into the distribution of errors (residuals) between the predicted and actual values, helping users identify patterns or anomalies in the model's performance.

  3. Performance Tuning: Users can use the plot to identify areas where the model performs well or poorly, which can guide further model improvement or feature engineering efforts.

In [18]:
# Import necessary functions and libraries
from xai.datasets.regression_data import load_bike_sharing_demand_data
from xai.regression.models import load_support_vector_model_and_data
from xai.regression.visualizations import create_actual_vs_predicted_distribution

# Load the trained support vector regression model and data
X_train, y_train, X_test, y_test, model = load_support_vector_model_and_data()
# This loads a trained support vector regression model and its associated training and testing data.

# Load the bike sharing demand dataset
df = load_bike_sharing_demand_data()
# This loads the bike sharing demand dataset for analysis.

# Use the model to make predictions on the test data
predicted = model.predict(X_test)
# The model is used to make predictions on the test data.

# Create an actual vs. predicted distribution plot
fig = create_actual_vs_predicted_distribution(
    actual=y_test['count'].tolist(),  # Actual values from the dataset
    predicted=predicted.tolist(),  # Predicted values from the model
    height=800,  # Custom height for the plot
    width=1400  # Custom width for the plot
)
# This function generates the actual vs. predicted distribution plot with specified height and width.

# Show the plot
fig.show()
# The generated plot is displayed, allowing users to visually assess the model's performance.

This code loads a trained support vector regression model, uses it to make predictions on the test data, and then creates an actual vs. predicted distribution plot. The plot visualizes the relationship between the actual and predicted values, aiding in model evaluation and performance analysis.

8. Function: create_qq_plot¶

Goal: The goal of this code is to create a QQ plot to assess whether the residuals (differences between actual and predicted values) of a regression model follow a normal distribution. This plot helps users determine whether their model's residuals exhibit a normal distribution pattern, which is an assumption of many statistical tests.

Benefits:

  1. Normality Assessment: The QQ plot visually compares the distribution of residuals to a theoretical normal distribution. Deviations from the straight line suggest departures from normality.

  2. Model Assumption Checking: Users can use this plot to verify one of the essential assumptions of linear regression and other statistical methods. If residuals follow a normal distribution, it ensures the validity of statistical inferences.

  3. Outlier Detection: The plot can reveal outliers and skewness in the residuals, which may indicate data quality issues or model limitations.

In [19]:
# Import necessary functions and libraries
from xai.datasets.regression_data import load_bike_sharing_demand_data
from xai.regression.models import load_random_forest_model_and_data
from xai.regression.visualizations import create_qq_plot

# Load the trained random forest regression model and data
X_train, y_train, X_test, y_test, model = load_random_forest_model_and_data()
# This loads a trained random forest regression model and its associated training and testing data.

# Load the bike sharing demand dataset
df = load_bike_sharing_demand_data()
# This loads the bike sharing demand dataset for analysis.

# Use the model to make predictions on the test data
predicted = model.predict(X_test)
# The model is used to make predictions on the test data.

# Calculate the residuals (actual - predicted)
residuals = y_test['count'] - predicted
# Residuals represent the differences between the actual and predicted values.

create_qq_plot(residuals)
# This function generates a QQ plot to assess the normality of residuals and visualizes how well they align with a theoretical normal distribution.

This code loads a trained random forest regression model, calculates residuals, and creates a QQ plot to assess the normality of these residuals. The QQ plot helps users evaluate the assumption of normality in regression analysis.

9. Function: create_residual_plot_with_shapley¶

Goal: The goal of this code is to create a residual plot with Shapley values for assessing how the Shapley values of individual features impact the model's residuals. It helps users understand which features contribute to errors in the model's predictions.

Benefits:

  1. Interpretability: This plot provides a visual representation of how the contributions of individual features influence the model's predictions. It aids in interpreting the impact of each feature on prediction errors.

  2. Error Analysis: Users can identify which features are associated with larger prediction errors, helping to diagnose model limitations and potential data issues.

  3. Model Improvement: Understanding feature contributions to residuals can guide model improvement efforts. Users can focus on mitigating the impact of features causing significant errors.

In [20]:
# Import necessary functions and libraries
from xai.datasets.regression_data import load_bike_sharing_demand_data
from xai.regression.models import load_random_forest_model_and_data
from xai.regression.visualizations import create_residual_plot_with_shapley

# Load the trained random forest regression model and data
X_train, y_train, X_test, y_test, model = load_random_forest_model_and_data()
# This loads a trained random forest regression model and its associated training and testing data.

# Load the bike sharing demand dataset
df = load_bike_sharing_demand_data()
# This loads the bike sharing demand dataset for analysis.

create_residual_plot_with_shapley(model, X_test, y_test['count'].tolist())
# This function generates a residual plot with Shapley values to visualize how individual features impact the model's residuals. It uses the trained model and test data to create this plot.

This code allows users to create a residual plot with Shapley values, providing insights into feature contributions to prediction errors and aiding in model interpretation and improvement.

10. Function: visualize_regression_metrics¶

Goal: The goal of this code is to visualize regression metrics to assess the performance of a regression model. It helps users understand how well the model's predictions align with the actual target values.

Benefits:

  1. Performance Assessment: Users can visualize various regression metrics to evaluate the model's accuracy and precision in predicting continuous target values.

  2. Model Comparison: This visualization allows users to compare different regression models or model configurations based on their performance metrics.

  3. Decision Making: By examining metrics like mean absolute error (MAE), mean squared error (MSE), and R-squared, users can make informed decisions about the model's suitability for their specific task.

In [21]:
# Import necessary functions and libraries
from xai.datasets.regression_data import load_admission_prediction_data
from xai.regression.models import load_linear_regression_model_and_data
from xai.regression.visualizations import visualize_regression_metrics

# Load the trained linear regression model and data
X_train, y_train, X_test, y_test, model = load_linear_regression_model_and_data()
# This loads a trained linear regression model and its associated training and testing data.

# Load the admission prediction dataset
df = load_admission_prediction_data()
# This loads the admission prediction dataset for analysis.

visualize_regression_metrics(y_test, model.predict(X_test))
# This function visualizes various regression metrics based on the actual target values (y_test) and the model's predictions (model.predict(X_test)).

This code allows users to visually assess the performance of a regression model using metrics such as MAE, MSE, and R-squared, aiding in model evaluation and decision-making.

11. Function: visualize_advanced_regression_metrics¶

Goal: The goal of this code is to visualize advanced regression metrics to provide a deeper understanding of a regression model's performance. It helps users gain insights into the model's predictive accuracy, distribution of residuals, and feature importances.

Benefits:

  1. In-Depth Analysis: Users can gain a more comprehensive view of the model's performance beyond basic metrics like mean squared error (MSE) and R-squared.

  2. Residual Distribution: This visualization shows the distribution of residuals, helping users understand whether the model's errors are normally distributed.

  3. Feature Importance: Users can assess the importance of each feature in predicting the target variable, aiding in feature selection and model interpretation.

In [22]:
# Import necessary functions and libraries
from xai.datasets.regression_data import load_admission_prediction_data
from xai.regression.models import load_linear_regression_model_and_data
from xai.regression.visualizations import visualize_advanced_regression_metrics

# Load the trained linear regression model and data
X_train, y_train, X_test, y_test, model = load_linear_regression_model_and_data()
# This loads a trained linear regression model and its associated training and testing data.

# Load the admission prediction dataset
df = load_admission_prediction_data()
# This loads the admission prediction dataset for analysis.

visualize_advanced_regression_metrics(y_test['Chance of Admit'], model.predict(X_test))
# This function visualizes advanced regression metrics based on the actual target values ('Chance of Admit') and the model's predictions (model.predict(X_test)).

This code empowers users to perform an in-depth analysis of a regression model's performance, including visualizing residual distributions and feature importances, which can be crucial for model understanding and improvement.